-
Notifications
You must be signed in to change notification settings - Fork 2.3k
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
[core] Record latency of successful and failed operations separately #507
Conversation
Note above, the reasons for travis-ci build failures are NOT related to this PR, it's this one: [ERROR] Failed to execute goal on project couchbase-binding: Could not resolve dependencies for project com.yahoo.ycsb:couchbase-binding:jar:0.6.0-SNAPSHOT: Could not transfer artifact org.apache.httpcomponents:httpcore:jar:4.1.1 from/to central (http://repo.maven.apache.org/maven2): Connect to repo.maven.apache.org:80 [repo.maven.apache.org/199.27.76.215] failed: Connection timed out -> [Help 1] Some kind of connection failure from the machine running travis build to maven. And similar connection error is seen from the travis machine to github.com [ERROR] Failed to execute goal on project tarantool-binding: Could not resolve dependencies for project com.yahoo.ycsb:tarantool-binding:jar:0.6.0-SNAPSHOT: Failed to collect dependencies at org.tarantool:connector:jar:1.6.1: Failed to read artifact descriptor for org.tarantool:connector:jar:1.6.1: Could not transfer artifact org.tarantool:connector:pom:1.6.1 from/to dgreenru-repo (http://dgreenru.github.com/repo/): Connect to dgreenru.github.com:80 [dgreenru.github.com/199.27.76.133] failed: Connection timed out -> [Help 1] Looks like either a transient problem, or problem with networking on the build machine. |
One more quick note: When viewing diffs for this change, it's best to view it on command line with git diff -w option, the web UI shows all diffs including white spaces. Since I made some minor styling changes that removed the Tabs and trailing white spaces and unnecessary empty lines, the diff becomes noisy if you view it with the white spaces included. The actual code diff is really as simple as this below:
The original code only has measure("READ", ist, st, en); and new code added the if-else check. |
@sffeng - I was thinking.... (Always dangerous!) If we change this to:
Then if we added a "DEFERRED" Status, for those bindings that support it, we would get metrics on operations that the client has not sent to the server yet and a different set on when they actually flush the operations to the server. Thoughts? Rob. |
@allanbank - measuring each error type individually is an interesting idea, although if we go down that route it might be best to have a configuration to turn that on, or default to consolidating error measurements, or even to just not report error measurements. My concern is that some bindings have several different error types that could get reported. With long running timeseries workloads you could possibly get buried in timeseries printouts if you hit several log types. |
I see valid points for both sides, so how about make latency tracking for any "non-OK" status configurable? that is, for any non-ok status we only track latency if it is configured via a property. The property can be a simple CSV of a list of non-ok status the user wants to track. I believe this should address Allan's use case where the user may want to track the latency for Status.Deferred meanwhile keep the output simple and concise for most users who don't really care about latency of non-OK operations. In terms of code, it will look something like this, still fairly simple: void init() { Then when each time we need to report latency, we do this: if (res == Status.OK) { |
@stfeng - I like the idea of making it configurable. Should there be a final else block with something like "READ-FAILED"? Rob |
Yes. I think there should be a final else block to catch all failed status. Thank you Rob and Andy for your inputs! I will make a follow-up commit on -Stanley On Sunday, November 22, 2015, allanbank [email protected] wrote:
|
So, I'm actually leaning to just a configuration to do one of three things:
Reason being, a philosophy I try to have is to make users have to sift through source as little as possible. Providing the user with the ability to specify specific error codes to track is definitely extremely flexible, but would require each module to document all of its possible error codes so people don't have to go searching for them in source. This is not impossible, just a lot more work, and more work to maintain. Thoughts? Maybe a combination of both? Sorry @stfeng making your simple PR a bit more complicated. |
@kruthar, Actually the latest proposed code change already does pretty much the same as you proposed above, and is quite simple for the user, let's take a look at the proposed code first: if (res == Status.OK) { "trackedNonOKStatus" is from configuration and by default is empty (i.e., user does not have to configure anything). So the default behavior is to report latency of all OK and all failed operations separately in two measurement names, which maps to your proposal 2) and 3). (If the user doesn't care about latency of failed operations, they can simply ignore that category). In practice, most users don't care about latency of failed operations. however, there are advanced users who care about latency of a specific (set of) failed operation(s), such as the latency of "deferred status" operations. For them, they can configure those status code in the "trackedNonOKStatus" property and we will report these configured status code separately. Such configuration is only required for users who care about a particular set of failed status code, and even for them, they do not need to list all possible status code a DB could return, they only need to configure the ones they already know about. I think the difference between your proposal and mine is you are proposing to have a mode that will report each and every error code separately. That could lead to unnecessary noise, but I think it's easy to add that mode here just in case it's useful to someone. One extra condition in the else-if clause would do the job: |
Could we make a special status for trackedNonOKStatus, e.g. "all", rather than introduce an additional flag? |
I also thought about introducing a special status ("all") but was worried it may conflict with a legitimate return status with the same name from a (future) DB binding. I believe "Status.*" should be used solely for the purpose of relaying status from DB binding to YCSB core. It seems to me an explicit flag makes the intention more clear here and the code more robust. However, if you guys feel strongly about reserving and using the "all" string as a special token for this purpose, just let me know, i am fine making the change that way too. Thanks. |
Folks, I've pushed a change that incorporates the latest discussions we had. I've tested it on my end and it works fine... From this point on, YCSB will no longer mix up latency of failed operations with those successful operations! ;) And, by default with no configuration required on the user's part, latency of failed operations will be aggregately reported in the "[OPERATION-Failed]" measurement name, or optionally, the user can choose to configure to record latency of failed operations in a finer grain with either: recording each error code separately, or pick and choose specific error codes they want to record separately. Anyway, the code speaks for it self, PTAL. ;) Thanks! |
@allanbank @kruthar @busbey - Could any one of you guys please take a look at this? The discussion is long but the code change is still pretty simple. ;) Thank you! |
@stfeng - We have separate issues and PRs for fixing checkstyle issues like spacing. I know it makes it harder to develop if line spacing is off, but when you mix a lot of format fixes with functionality fixes it makes it hard to evaluate the code, can you see if making just the functionality fixes related to this PR is possible? |
ah dang. my earlier review comment got eaten. I'll repost. Reviewers can look at teh total changes while ignoring whitespace with "?w=1" on the url. eg: https://github.com/brianfrankcooper/YCSB/pull/507/files?w=1 That said, can you please make this PR 2 commits, one with all the formatting fixes and then one with the functionality change? Both commits should use the "[core]" message prefix. |
// setting reportLatencyForEachError to true, or | ||
// 2) Record and report latency for a select set of error status codes by | ||
// providing a CSV list of Status codes via the "latencytrackederrors" | ||
// property. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
this explanation needs to be in docs somewhere. probably the workload template and the Core Properties wiki page.
Given the current limitations of our github setup, could you add something to the workload template?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Done. Moved the workload_template.
Based on feedback above, I've separated out my original commits into two: First one is for coding style change only, and Then on top of these two commits, I made the 3rd commit that incorporated @busbey's CR comments. PTAL. |
Dear Gentleman, Please take me off of this thread -- I believe my userID got mixed up -- Thanks On Wed, Nov 25, 2015 at 7:56 AM, Stanley Feng [email protected]
|
Specifically: 1. Remove trailing spaces, 2. Replace TAB with spaces, 3. Break lines > 80 characters 4. Misc code alignment. No functional code change. use ?w=1 on the URL to view non-whitespace changes.
And make it configurable.
Sorry sffeng. I believe that's a typo. :) @allanbank looks like you referred to sffeng in one of your comments above, because only you can modify/edit your own comments, can you please change it so sffeng is no longer getting these updates? Thanks! (i am @stfeng ;) |
No problem, best wishes to your project! On Wed, Nov 25, 2015 at 8:14 AM, Stanley Feng [email protected]
|
- Move the doc (code comment) to workload_template - Minor changes in propery parsing logic and log to System.Err; [core] Incoporate CR comment Missed one comment before.
Rebased to master, and pushed my commits that incorporated the CR comments, including the request to separate the styling change from the functional change into different commits. PTAL. Thanks. |
I think sffeng needs to click on the "unsubscribe" button in the notifications area near the top of the page. |
looks fine by me. other folks? |
_db.init(); | ||
|
||
this.reportLatencyForEachError = Boolean.parseBoolean(getProperties(). | ||
getProperty("reportlatencyforeacherror", "false")); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Can you pull the property string and default values into variables? You can see this convention in CoreWorkload: https://github.com/brianfrankcooper/YCSB/blob/master/core/src/main/java/com/yahoo/ycsb/workloads/CoreWorkload.java#L110-L117
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Done.
Define constants for the property strings.
Done incorporating @kruthar's feedback. Thanks for the reviews! |
[core] Record latency of successful and failed operations separately
Thanks! Note to next RM, call this out as incompatible. |
…fixes [core] Record latency of successful and failed operations separately
…fixes [core] Record latency of successful and failed operations separately
Plus some minor styling fixes: break long lines, replace tabs with white spaces.