Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

[core] Record latency of successful and failed operations separately #507

Merged
merged 4 commits into from Nov 27, 2015
Merged

[core] Record latency of successful and failed operations separately #507

merged 4 commits into from Nov 27, 2015

Conversation

ghost
Copy link

@ghost ghost commented Nov 22, 2015

Plus some minor styling fixes: break long lines, replace tabs with white spaces.

@ghost
Copy link
Author

ghost commented Nov 22, 2015

This is for issue #506
@busbey PTAL, thanks!

@ghost
Copy link
Author

ghost commented Nov 22, 2015

Note above, the reasons for travis-ci build failures are NOT related to this PR, it's this one:

[ERROR] Failed to execute goal on project couchbase-binding: Could not resolve dependencies for project com.yahoo.ycsb:couchbase-binding:jar:0.6.0-SNAPSHOT: Could not transfer artifact org.apache.httpcomponents:httpcore:jar:4.1.1 from/to central (http://repo.maven.apache.org/maven2): Connect to repo.maven.apache.org:80 [repo.maven.apache.org/199.27.76.215] failed: Connection timed out -> [Help 1]

Some kind of connection failure from the machine running travis build to maven. And similar connection error is seen from the travis machine to github.com

[ERROR] Failed to execute goal on project tarantool-binding: Could not resolve dependencies for project com.yahoo.ycsb:tarantool-binding:jar:0.6.0-SNAPSHOT: Failed to collect dependencies at org.tarantool:connector:jar:1.6.1: Failed to read artifact descriptor for org.tarantool:connector:jar:1.6.1: Could not transfer artifact org.tarantool:connector:pom:1.6.1 from/to dgreenru-repo (http://dgreenru.github.com/repo/): Connect to dgreenru.github.com:80 [dgreenru.github.com/199.27.76.133] failed: Connection timed out -> [Help 1]

Looks like either a transient problem, or problem with networking on the build machine.

@ghost
Copy link
Author

ghost commented Nov 22, 2015

One more quick note:

When viewing diffs for this change, it's best to view it on command line with git diff -w option, the web UI shows all diffs including white spaces. Since I made some minor styling changes that removed the Tabs and trailing white spaces and unnecessary empty lines, the diff becomes noisy if you view it with the white spaces included. The actual code diff is really as simple as this below:

  • if (res == Status.OK) {
  •  measure("READ", ist, st, en);
    
  • } else {
  •  measure("READ-FAILED", ist, st, en);
    
  • }

The original code only has measure("READ", ist, st, en); and new code added the if-else check.

@allanbank
Copy link
Collaborator

@sffeng -

I was thinking.... (Always dangerous!)

If we change this to:

  measure("READ-" + res.getName(), ist, st, en);

Then if we added a "DEFERRED" Status, for those bindings that support it, we would get metrics on operations that the client has not sent to the server yet and a different set on when they actually flush the operations to the server.

Thoughts?

Rob.

@kruthar
Copy link
Collaborator

kruthar commented Nov 22, 2015

@allanbank - measuring each error type individually is an interesting idea, although if we go down that route it might be best to have a configuration to turn that on, or default to consolidating error measurements, or even to just not report error measurements. My concern is that some bindings have several different error types that could get reported. With long running timeseries workloads you could possibly get buried in timeseries printouts if you hit several log types.

@ghost
Copy link
Author

ghost commented Nov 22, 2015

I see valid points for both sides, so how about make latency tracking for any "non-OK" status configurable? that is, for any non-ok status we only track latency if it is configured via a property. The property can be a simple CSV of a list of non-ok status the user wants to track.

I believe this should address Allan's use case where the user may want to track the latency for Status.Deferred meanwhile keep the output simple and concise for most users who don't really care about latency of non-OK operations.

In terms of code, it will look something like this, still fairly simple:

void init() {
String nonOKStatusTracking = Properties.getProperty("extra-status-tracking", "");
Set trackedNonOKStatus = new HashSet(Arrays.asList(nonOKStatusTracking.split(",")));
}

Then when each time we need to report latency, we do this:

if (res == Status.OK) {
measure("READ", ist, st, en);
} else if (trackedNonOKStatus.contains(res.getName()) {
measure("READ-" + res.getName(), ist, st, en);
}

@allanbank
Copy link
Collaborator

@stfeng - I like the idea of making it configurable.

Should there be a final else block with something like "READ-FAILED"?

Rob

@ghost
Copy link
Author

ghost commented Nov 22, 2015

Yes. I think there should be a final else block to catch all failed status.

Thank you Rob and Andy for your inputs! I will make a follow-up commit on
Monday to reflect this discussion.

-Stanley

On Sunday, November 22, 2015, allanbank [email protected] wrote:

@stfeng https://github.com/stfeng - I like the idea of making it
configurable.

Should there be a final else block with something like "READ-FAILED"?

Rob


Reply to this email directly or view it on GitHub
#507 (comment)
.

@kruthar
Copy link
Collaborator

kruthar commented Nov 23, 2015

So, I'm actually leaning to just a configuration to do one of three things:

  1. report all errors - reports all error types separately
  2. report consolidated errors - reports all errors under a single measurement name
  3. don't report errors - default

Reason being, a philosophy I try to have is to make users have to sift through source as little as possible. Providing the user with the ability to specify specific error codes to track is definitely extremely flexible, but would require each module to document all of its possible error codes so people don't have to go searching for them in source. This is not impossible, just a lot more work, and more work to maintain.

Thoughts? Maybe a combination of both? Sorry @stfeng making your simple PR a bit more complicated.

@ghost
Copy link
Author

ghost commented Nov 23, 2015

@kruthar, Actually the latest proposed code change already does pretty much the same as you proposed above, and is quite simple for the user, let's take a look at the proposed code first:

if (res == Status.OK) {
measure("READ", ist, st, en);
} else if (trackedNonOKStatus.contains(res.getName()) {
measure("READ-" + res.getName(), ist, st, en);
} else {
measure("READ-failed", ist, st, en);
}

"trackedNonOKStatus" is from configuration and by default is empty (i.e., user does not have to configure anything). So the default behavior is to report latency of all OK and all failed operations separately in two measurement names, which maps to your proposal 2) and 3). (If the user doesn't care about latency of failed operations, they can simply ignore that category).

In practice, most users don't care about latency of failed operations. however, there are advanced users who care about latency of a specific (set of) failed operation(s), such as the latency of "deferred status" operations. For them, they can configure those status code in the "trackedNonOKStatus" property and we will report these configured status code separately. Such configuration is only required for users who care about a particular set of failed status code, and even for them, they do not need to list all possible status code a DB could return, they only need to configure the ones they already know about.

I think the difference between your proposal and mine is you are proposing to have a mode that will report each and every error code separately. That could lead to unnecessary noise, but I think it's easy to add that mode here just in case it's useful to someone. One extra condition in the else-if clause would do the job:
...
else if (reportLatencyForAllErrorCodes == true || trackedNonOKStatus.contains(res.getName()) {
measure("READ-" + res.getName(), ist, st, en);
} else {
...
(and by default reportLatencyForAllErrorCodes is false)

@busbey
Copy link
Collaborator

busbey commented Nov 23, 2015

else if (reportLatencyForAllErrorCodes == true || trackedNonOKStatus.contains(res.getName()) {
measure("READ-" + res.getName(), ist, st, en);
} else {

Could we make a special status for trackedNonOKStatus, e.g. "all", rather than introduce an additional flag?

@ghost
Copy link
Author

ghost commented Nov 23, 2015

I also thought about introducing a special status ("all") but was worried it may conflict with a legitimate return status with the same name from a (future) DB binding. I believe "Status.*" should be used solely for the purpose of relaying status from DB binding to YCSB core. It seems to me an explicit flag makes the intention more clear here and the code more robust.

However, if you guys feel strongly about reserving and using the "all" string as a special token for this purpose, just let me know, i am fine making the change that way too. Thanks.

@ghost
Copy link
Author

ghost commented Nov 24, 2015

Folks, I've pushed a change that incorporates the latest discussions we had.

I've tested it on my end and it works fine... From this point on, YCSB will no longer mix up latency of failed operations with those successful operations! ;) And, by default with no configuration required on the user's part, latency of failed operations will be aggregately reported in the "[OPERATION-Failed]" measurement name, or optionally, the user can choose to configure to record latency of failed operations in a finer grain with either: recording each error code separately, or pick and choose specific error codes they want to record separately. Anyway, the code speaks for it self, PTAL. ;)

Thanks!

@ghost ghost changed the title Record latency of successful and failed operations separately [core] Record latency of successful and failed operations separately Nov 24, 2015
@ghost
Copy link
Author

ghost commented Nov 24, 2015

@allanbank @kruthar @busbey - Could any one of you guys please take a look at this? The discussion is long but the code change is still pretty simple. ;)

Thank you!

@kruthar
Copy link
Collaborator

kruthar commented Nov 25, 2015

@stfeng - We have separate issues and PRs for fixing checkstyle issues like spacing. I know it makes it harder to develop if line spacing is off, but when you mix a lot of format fixes with functionality fixes it makes it hard to evaluate the code, can you see if making just the functionality fixes related to this PR is possible?

@busbey
Copy link
Collaborator

busbey commented Nov 25, 2015

ah dang. my earlier review comment got eaten. I'll repost.

Reviewers can look at teh total changes while ignoring whitespace with "?w=1" on the url.

eg:

https://github.com/brianfrankcooper/YCSB/pull/507/files?w=1

That said, can you please make this PR 2 commits, one with all the formatting fixes and then one with the functionality change? Both commits should use the "[core]" message prefix.

// setting reportLatencyForEachError to true, or
// 2) Record and report latency for a select set of error status codes by
// providing a CSV list of Status codes via the "latencytrackederrors"
// property.
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

this explanation needs to be in docs somewhere. probably the workload template and the Core Properties wiki page.

Given the current limitations of our github setup, could you add something to the workload template?

Copy link
Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Done. Moved the workload_template.

@ghost
Copy link
Author

ghost commented Nov 25, 2015

Based on feedback above, I've separated out my original commits into two:

First one is for coding style change only, and
The second one is the actual functionality change.

Then on top of these two commits, I made the 3rd commit that incorporated @busbey's CR comments.

PTAL.

@sffeng
Copy link

sffeng commented Nov 25, 2015

​Dear Gentleman,

Please take me off of this thread -- I believe my userID got mixed up --
sffeng vs stfeng.

Thanks​

On Wed, Nov 25, 2015 at 7:56 AM, Stanley Feng [email protected]
wrote:

Based on feedback above, I've separated out my original commits into two:

First one is for coding style change only, and
The second one is the actual functionality change.

Then on top of these two commits, I made the 3rd commit that incorporated
@busbey https://github.com/busbey's CR comments.

PTAL.


Reply to this email directly or view it on GitHub
#507 (comment)
.

Specifically:

1. Remove trailing spaces,
2. Replace TAB with spaces,
3. Break lines > 80 characters
4. Misc code alignment.

No functional code change. use ?w=1 on the URL to view non-whitespace changes.
@ghost
Copy link
Author

ghost commented Nov 25, 2015

Sorry sffeng. I believe that's a typo. :)

@allanbank looks like you referred to sffeng in one of your comments above, because only you can modify/edit your own comments, can you please change it so sffeng is no longer getting these updates? Thanks! (i am @stfeng ;)

@sffeng
Copy link

sffeng commented Nov 25, 2015

No problem, best wishes to your project!

On Wed, Nov 25, 2015 at 8:14 AM, Stanley Feng [email protected]
wrote:

Sorry @sffeng https://github.com/sffeng. I believe that's a typo. :)

@allanbank https://github.com/allanbank looks like you referred to
@sffeng https://github.com/sffeng in one of your comments above,
because only you can modify/edit your own comments, can you please change
it so @sffeng https://github.com/sffeng is no longer getting these
updates? Thanks! (i am @stfeng https://github.com/stfeng ;)


Reply to this email directly or view it on GitHub
#507 (comment)
.

- Move the doc (code comment) to workload_template
- Minor changes in propery parsing logic and log to System.Err;

[core] Incoporate CR comment

Missed one comment before.
@ghost
Copy link
Author

ghost commented Nov 25, 2015

Rebased to master, and pushed my commits that incorporated the CR comments, including the request to separate the styling change from the functional change into different commits.

PTAL. Thanks.

@busbey
Copy link
Collaborator

busbey commented Nov 25, 2015

I think sffeng needs to click on the "unsubscribe" button in the notifications area near the top of the page.

@busbey
Copy link
Collaborator

busbey commented Nov 25, 2015

looks fine by me. other folks?

_db.init();

this.reportLatencyForEachError = Boolean.parseBoolean(getProperties().
getProperty("reportlatencyforeacherror", "false"));
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Can you pull the property string and default values into variables? You can see this convention in CoreWorkload: https://github.com/brianfrankcooper/YCSB/blob/master/core/src/main/java/com/yahoo/ycsb/workloads/CoreWorkload.java#L110-L117

Copy link
Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Done.

Define constants for the property strings.
@ghost
Copy link
Author

ghost commented Nov 25, 2015

Done incorporating @kruthar's feedback. Thanks for the reviews!

busbey added a commit that referenced this pull request Nov 27, 2015
[core] Record latency of successful and failed operations separately
@busbey busbey merged commit 8e9abc4 into brianfrankcooper:master Nov 27, 2015
@busbey
Copy link
Collaborator

busbey commented Nov 27, 2015

Thanks!

Note to next RM, call this out as incompatible.

@bigbes bigbes mentioned this pull request Dec 20, 2015
jaricftw pushed a commit to jaricftw/YCSB that referenced this pull request Jul 19, 2016
…fixes

[core] Record latency of successful and failed operations separately
jaricftw pushed a commit to jaricftw/YCSB that referenced this pull request Jul 19, 2016
…fixes

[core] Record latency of successful and failed operations separately
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

5 participants