[core] Record latency of successful and failed operations separately #507

ghost · 2015-11-22T17:01:18Z

Plus some minor styling fixes: break long lines, replace tabs with white spaces.

ghost · 2015-11-22T17:01:31Z

This is for issue #506
@busbey PTAL, thanks!

ghost · 2015-11-22T17:09:17Z

Note above, the reasons for travis-ci build failures are NOT related to this PR, it's this one:

[ERROR] Failed to execute goal on project couchbase-binding: Could not resolve dependencies for project com.yahoo.ycsb:couchbase-binding:jar:0.6.0-SNAPSHOT: Could not transfer artifact org.apache.httpcomponents:httpcore:jar:4.1.1 from/to central (http://repo.maven.apache.org/maven2): Connect to repo.maven.apache.org:80 [repo.maven.apache.org/199.27.76.215] failed: Connection timed out -> [Help 1]

Some kind of connection failure from the machine running travis build to maven. And similar connection error is seen from the travis machine to github.com

[ERROR] Failed to execute goal on project tarantool-binding: Could not resolve dependencies for project com.yahoo.ycsb:tarantool-binding:jar:0.6.0-SNAPSHOT: Failed to collect dependencies at org.tarantool:connector:jar:1.6.1: Failed to read artifact descriptor for org.tarantool:connector:jar:1.6.1: Could not transfer artifact org.tarantool:connector:pom:1.6.1 from/to dgreenru-repo (http://dgreenru.github.com/repo/): Connect to dgreenru.github.com:80 [dgreenru.github.com/199.27.76.133] failed: Connection timed out -> [Help 1]

Looks like either a transient problem, or problem with networking on the build machine.

ghost · 2015-11-22T17:11:45Z

One more quick note:

When viewing diffs for this change, it's best to view it on command line with git diff -w option, the web UI shows all diffs including white spaces. Since I made some minor styling changes that removed the Tabs and trailing white spaces and unnecessary empty lines, the diff becomes noisy if you view it with the white spaces included. The actual code diff is really as simple as this below:

if (res == Status.OK) {
```
 measure("READ", ist, st, en);
```
} else {
```
 measure("READ-FAILED", ist, st, en);
```
}

The original code only has measure("READ", ist, st, en); and new code added the if-else check.

allanbank · 2015-11-22T18:59:06Z

@sffeng -

I was thinking.... (Always dangerous!)

If we change this to:

  measure("READ-" + res.getName(), ist, st, en);

Then if we added a "DEFERRED" Status, for those bindings that support it, we would get metrics on operations that the client has not sent to the server yet and a different set on when they actually flush the operations to the server.

Thoughts?

Rob.

kruthar · 2015-11-22T22:37:57Z

@allanbank - measuring each error type individually is an interesting idea, although if we go down that route it might be best to have a configuration to turn that on, or default to consolidating error measurements, or even to just not report error measurements. My concern is that some bindings have several different error types that could get reported. With long running timeseries workloads you could possibly get buried in timeseries printouts if you hit several log types.

ghost · 2015-11-22T23:10:20Z

I see valid points for both sides, so how about make latency tracking for any "non-OK" status configurable? that is, for any non-ok status we only track latency if it is configured via a property. The property can be a simple CSV of a list of non-ok status the user wants to track.

I believe this should address Allan's use case where the user may want to track the latency for Status.Deferred meanwhile keep the output simple and concise for most users who don't really care about latency of non-OK operations.

In terms of code, it will look something like this, still fairly simple:

void init() {
String nonOKStatusTracking = Properties.getProperty("extra-status-tracking", "");
Set trackedNonOKStatus = new HashSet(Arrays.asList(nonOKStatusTracking.split(",")));
}

Then when each time we need to report latency, we do this:

if (res == Status.OK) {
measure("READ", ist, st, en);
} else if (trackedNonOKStatus.contains(res.getName()) {
measure("READ-" + res.getName(), ist, st, en);
}

allanbank · 2015-11-22T23:29:42Z

@stfeng - I like the idea of making it configurable.

Should there be a final else block with something like "READ-FAILED"?

Rob

ghost · 2015-11-22T23:39:49Z

Yes. I think there should be a final else block to catch all failed status.

Thank you Rob and Andy for your inputs! I will make a follow-up commit on
Monday to reflect this discussion.

-Stanley

On Sunday, November 22, 2015, allanbank [email protected] wrote:

@stfeng https://github.com/stfeng - I like the idea of making it
configurable.

Should there be a final else block with something like "READ-FAILED"?

Rob

—
Reply to this email directly or view it on GitHub
#507 (comment)
.

kruthar · 2015-11-23T02:13:46Z

So, I'm actually leaning to just a configuration to do one of three things:

report all errors - reports all error types separately
report consolidated errors - reports all errors under a single measurement name
don't report errors - default

Reason being, a philosophy I try to have is to make users have to sift through source as little as possible. Providing the user with the ability to specify specific error codes to track is definitely extremely flexible, but would require each module to document all of its possible error codes so people don't have to go searching for them in source. This is not impossible, just a lot more work, and more work to maintain.

Thoughts? Maybe a combination of both? Sorry @stfeng making your simple PR a bit more complicated.

ghost · 2015-11-23T05:12:02Z

@kruthar, Actually the latest proposed code change already does pretty much the same as you proposed above, and is quite simple for the user, let's take a look at the proposed code first:

if (res == Status.OK) {
measure("READ", ist, st, en);
} else if (trackedNonOKStatus.contains(res.getName()) {
measure("READ-" + res.getName(), ist, st, en);
} else {
measure("READ-failed", ist, st, en);
}

"trackedNonOKStatus" is from configuration and by default is empty (i.e., user does not have to configure anything). So the default behavior is to report latency of all OK and all failed operations separately in two measurement names, which maps to your proposal 2) and 3). (If the user doesn't care about latency of failed operations, they can simply ignore that category).

In practice, most users don't care about latency of failed operations. however, there are advanced users who care about latency of a specific (set of) failed operation(s), such as the latency of "deferred status" operations. For them, they can configure those status code in the "trackedNonOKStatus" property and we will report these configured status code separately. Such configuration is only required for users who care about a particular set of failed status code, and even for them, they do not need to list all possible status code a DB could return, they only need to configure the ones they already know about.

I think the difference between your proposal and mine is you are proposing to have a mode that will report each and every error code separately. That could lead to unnecessary noise, but I think it's easy to add that mode here just in case it's useful to someone. One extra condition in the else-if clause would do the job:
...
else if (reportLatencyForAllErrorCodes == true || trackedNonOKStatus.contains(res.getName()) {
measure("READ-" + res.getName(), ist, st, en);
} else {
...
(and by default reportLatencyForAllErrorCodes is false)

busbey · 2015-11-23T16:50:43Z

else if (reportLatencyForAllErrorCodes == true || trackedNonOKStatus.contains(res.getName()) {
measure("READ-" + res.getName(), ist, st, en);
} else {

Could we make a special status for trackedNonOKStatus, e.g. "all", rather than introduce an additional flag?

ghost · 2015-11-23T17:08:44Z

I also thought about introducing a special status ("all") but was worried it may conflict with a legitimate return status with the same name from a (future) DB binding. I believe "Status.*" should be used solely for the purpose of relaying status from DB binding to YCSB core. It seems to me an explicit flag makes the intention more clear here and the code more robust.

However, if you guys feel strongly about reserving and using the "all" string as a special token for this purpose, just let me know, i am fine making the change that way too. Thanks.

ghost · 2015-11-24T00:56:30Z

Folks, I've pushed a change that incorporates the latest discussions we had.

I've tested it on my end and it works fine... From this point on, YCSB will no longer mix up latency of failed operations with those successful operations! ;) And, by default with no configuration required on the user's part, latency of failed operations will be aggregately reported in the "[OPERATION-Failed]" measurement name, or optionally, the user can choose to configure to record latency of failed operations in a finer grain with either: recording each error code separately, or pick and choose specific error codes they want to record separately. Anyway, the code speaks for it self, PTAL. ;)

Thanks!

ghost · 2015-11-24T16:52:08Z

@allanbank @kruthar @busbey - Could any one of you guys please take a look at this? The discussion is long but the code change is still pretty simple. ;)

Thank you!

kruthar · 2015-11-25T02:03:46Z

@stfeng - We have separate issues and PRs for fixing checkstyle issues like spacing. I know it makes it harder to develop if line spacing is off, but when you mix a lot of format fixes with functionality fixes it makes it hard to evaluate the code, can you see if making just the functionality fixes related to this PR is possible?

busbey · 2015-11-25T02:27:22Z

ah dang. my earlier review comment got eaten. I'll repost.

Reviewers can look at teh total changes while ignoring whitespace with "?w=1" on the url.

eg:

https://github.com/brianfrankcooper/YCSB/pull/507/files?w=1

That said, can you please make this PR 2 commits, one with all the formatting fixes and then one with the functionality change? Both commits should use the "[core]" message prefix.

busbey · 2015-11-25T02:30:47Z

core/src/main/java/com/yahoo/ycsb/DBWrapper.java

+  //    setting reportLatencyForEachError to true, or
+  // 2) Record and report latency for a select set of error status codes by
+  //    providing a CSV list of Status codes via the "latencytrackederrors"
+  //    property.


this explanation needs to be in docs somewhere. probably the workload template and the Core Properties wiki page.

Given the current limitations of our github setup, could you add something to the workload template?

Done. Moved the workload_template.

ghost · 2015-11-25T03:56:19Z

Based on feedback above, I've separated out my original commits into two:

First one is for coding style change only, and
The second one is the actual functionality change.

Then on top of these two commits, I made the 3rd commit that incorporated @busbey's CR comments.

PTAL.

sffeng · 2015-11-25T04:00:13Z

Dear Gentleman,

Please take me off of this thread -- I believe my userID got mixed up --
sffeng vs stfeng.

Thanks

On Wed, Nov 25, 2015 at 7:56 AM, Stanley Feng [email protected]
wrote:

Based on feedback above, I've separated out my original commits into two:

First one is for coding style change only, and
The second one is the actual functionality change.

Then on top of these two commits, I made the 3rd commit that incorporated
@busbey https://github.com/busbey's CR comments.

PTAL.

—
Reply to this email directly or view it on GitHub
#507 (comment)
.

Specifically: 1. Remove trailing spaces, 2. Replace TAB with spaces, 3. Break lines > 80 characters 4. Misc code alignment. No functional code change. use ?w=1 on the URL to view non-whitespace changes.

And make it configurable.

ghost · 2015-11-25T04:14:35Z

Sorry sffeng. I believe that's a typo. :)

@allanbank looks like you referred to sffeng in one of your comments above, because only you can modify/edit your own comments, can you please change it so sffeng is no longer getting these updates? Thanks! (i am @stfeng ;)

sffeng · 2015-11-25T04:15:45Z

No problem, best wishes to your project!

On Wed, Nov 25, 2015 at 8:14 AM, Stanley Feng [email protected]
wrote:

Sorry @sffeng https://github.com/sffeng. I believe that's a typo. :)

@allanbank https://github.com/allanbank looks like you referred to
@sffeng https://github.com/sffeng in one of your comments above,
because only you can modify/edit your own comments, can you please change
it so @sffeng https://github.com/sffeng is no longer getting these
updates? Thanks! (i am @stfeng https://github.com/stfeng ;)

—
Reply to this email directly or view it on GitHub
#507 (comment)
.

- Move the doc (code comment) to workload_template - Minor changes in propery parsing logic and log to System.Err; [core] Incoporate CR comment Missed one comment before.

ghost · 2015-11-25T04:24:37Z

Rebased to master, and pushed my commits that incorporated the CR comments, including the request to separate the styling change from the functional change into different commits.

PTAL. Thanks.

busbey · 2015-11-25T04:26:35Z

I think sffeng needs to click on the "unsubscribe" button in the notifications area near the top of the page.

busbey · 2015-11-25T04:31:03Z

looks fine by me. other folks?

kruthar · 2015-11-25T18:52:47Z

core/src/main/java/com/yahoo/ycsb/DBWrapper.java

+    _db.init();
+
+    this.reportLatencyForEachError = Boolean.parseBoolean(getProperties().
+        getProperty("reportlatencyforeacherror", "false"));


Can you pull the property string and default values into variables? You can see this convention in CoreWorkload: https://github.com/brianfrankcooper/YCSB/blob/master/core/src/main/java/com/yahoo/ycsb/workloads/CoreWorkload.java#L110-L117

Define constants for the property strings.

ghost · 2015-11-25T19:20:16Z

Done incorporating @kruthar's feedback. Thanks for the reviews!

[core] Record latency of successful and failed operations separately

busbey · 2015-11-27T22:48:57Z

Thanks!

Note to next RM, call this out as incompatible.

…fixes [core] Record latency of successful and failed operations separately

ghost mentioned this pull request Nov 22, 2015

YCSB should not record latency data from failed operations #506

Closed

ghost changed the title ~~Record latency of successful and failed operations separately~~ [core] Record latency of successful and failed operations separately Nov 24, 2015

busbey reviewed Nov 25, 2015
View reviewed changes

stfeng2 added 2 commits November 24, 2015 20:10

[core] Fix some coding style issues in DBWrapper.java

66c2b6c

Specifically: 1. Remove trailing spaces, 2. Replace TAB with spaces, 3. Break lines > 80 characters 4. Misc code alignment. No functional code change. use ?w=1 on the URL to view non-whitespace changes.

[core] Record latency of successful and failed operations separately

d4ff9ec

And make it configurable.

[core] Incorporate CR feedback

3e3d401

- Move the doc (code comment) to workload_template - Minor changes in propery parsing logic and log to System.Err; [core] Incoporate CR comment Missed one comment before.

kruthar reviewed Nov 25, 2015
View reviewed changes

[core] Incorporate CR feedback (2)

78563ec

Define constants for the property strings.

busbey added a commit that referenced this pull request Nov 27, 2015

Merge pull request #507 from stfeng/stfeng-ycsb-core-fixes

8e9abc4

[core] Record latency of successful and failed operations separately

busbey merged commit 8e9abc4 into brianfrankcooper:master Nov 27, 2015

bigbes mentioned this pull request Dec 20, 2015

Release version 0.6.0 #542

Closed

jaricftw pushed a commit to jaricftw/YCSB that referenced this pull request Jul 19, 2016

Merge pull request brianfrankcooper#507 from stfeng/stfeng-ycsb-core-…

79d3edb

…fixes [core] Record latency of successful and failed operations separately

jaricftw pushed a commit to jaricftw/YCSB that referenced this pull request Jul 19, 2016

Merge pull request brianfrankcooper#507 from stfeng/stfeng-ycsb-core-…

da14817

…fixes [core] Record latency of successful and failed operations separately

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

[core] Record latency of successful and failed operations separately #507

[core] Record latency of successful and failed operations separately #507

ghost commented Nov 22, 2015

ghost commented Nov 22, 2015

ghost commented Nov 22, 2015

ghost commented Nov 22, 2015

allanbank commented Nov 22, 2015

kruthar commented Nov 22, 2015

ghost commented Nov 22, 2015

allanbank commented Nov 22, 2015

ghost commented Nov 22, 2015

kruthar commented Nov 23, 2015

ghost commented Nov 23, 2015

busbey commented Nov 23, 2015

ghost commented Nov 23, 2015

ghost commented Nov 24, 2015

ghost commented Nov 24, 2015

kruthar commented Nov 25, 2015

busbey commented Nov 25, 2015

busbey Nov 25, 2015

ghost Nov 25, 2015

ghost commented Nov 25, 2015

sffeng commented Nov 25, 2015

ghost commented Nov 25, 2015

sffeng commented Nov 25, 2015

ghost commented Nov 25, 2015

busbey commented Nov 25, 2015

busbey commented Nov 25, 2015

kruthar Nov 25, 2015

ghost Nov 25, 2015

ghost commented Nov 25, 2015

busbey commented Nov 27, 2015

[core] Record latency of successful and failed operations separately #507

[core] Record latency of successful and failed operations separately #507

Conversation

ghost commented Nov 22, 2015

ghost commented Nov 22, 2015

ghost commented Nov 22, 2015

ghost commented Nov 22, 2015

allanbank commented Nov 22, 2015

kruthar commented Nov 22, 2015

ghost commented Nov 22, 2015

allanbank commented Nov 22, 2015

ghost commented Nov 22, 2015

kruthar commented Nov 23, 2015

ghost commented Nov 23, 2015

busbey commented Nov 23, 2015

ghost commented Nov 23, 2015

ghost commented Nov 24, 2015

ghost commented Nov 24, 2015

kruthar commented Nov 25, 2015

busbey commented Nov 25, 2015

busbey Nov 25, 2015

Choose a reason for hiding this comment

ghost Nov 25, 2015

Choose a reason for hiding this comment

ghost commented Nov 25, 2015

sffeng commented Nov 25, 2015

ghost commented Nov 25, 2015

sffeng commented Nov 25, 2015

ghost commented Nov 25, 2015

busbey commented Nov 25, 2015

busbey commented Nov 25, 2015

kruthar Nov 25, 2015

Choose a reason for hiding this comment

ghost Nov 25, 2015

Choose a reason for hiding this comment

ghost commented Nov 25, 2015

busbey commented Nov 27, 2015