nsqd: optimize guid generation #663

ploxiln · 2015-09-29T23:56:39Z

Humble idea inspired by #658

Current-nanosecond-time and 64-bit division operations
appear to be slow on some ARM systems, so minimize them:

share time.Time from each loop of idPump() with NewGUID()
avoid adding a second to lastError on each iteration,
by making it nextErrorLog instead
replace division by 1000000 with a bitshift
equivalent to division by 1048576

My benchmarks on darwin / x86_64 laptop (which will depend on temperature and other factors of course) show a negligible improvement, but it will be interesting to see the results for an ARM system (I have a raspberry pi I could dig out and try, maybe later).

before:

PUB: [bench_writer] 2015/09/29 19:52:57 duration: 10.000785444s - 42.458mb/s - 222602.516ops/s - 4.492us/op
SUB: [bench_reader] 2015/09/29 19:53:07 duration: 10.01740169s - 21.631mb/s - 113407.751ops/s - 8.818us/op

after:

PUB: [bench_writer] 2015/09/29 19:41:31 duration: 10.001047234s - 43.300mb/s - 227016.226ops/s - 4.405us/op
SUB: [bench_reader] 2015/09/29 19:41:41 duration: 10.023406992s - 23.684mb/s - 124172.051ops/s - 8.053us/op

ploxiln · 2015-09-30T00:10:01Z

btw test failure is just go1.4.2 on i386, TestEphemeralTopicsAndChannels got non-zero number of topics at the end

ploxiln · 2015-09-30T00:34:39Z

OK, I feel pretty silly now for not seemingly not reading idPump() - this won't make much of a difference if there isn't a constant stream of errors from NewGUID() (well, except for the int64 division trick)

mreiferson · 2015-09-30T17:13:42Z

@ploxiln any of this worth landing? I'm a little concerned about micro-optimizations like bit-shift-for-division, but maybe that one?

ploxiln · 2015-09-30T17:21:10Z

I'm not sure this should be merged as-is, I expect we'll want to tweak it. We may not need the subjective uglyness of the additional argument to NewGUID() - it seems that errors shouldn't be that common ...

I like the shift optimization - it makes sense that it could be noticeably helpful on ARM, I'll check.

mreiferson · 2015-09-30T20:51:00Z

Let's drop the time stuff and leave the shift optimization and land it.

mreiferson · 2015-09-30T20:51:05Z

Thanks @ploxiln

ploxiln · 2015-10-01T02:04:30Z

I've reduced this PR to just the shift optimization. But I haven't tested it on arm yet, and I don't think there's any rush.

I'm in the process of getting stuff set up on a raspberrypi 2 which has 4 Cortex A7 cores, which are lower-power and in-order rather than out-of-order but from the same generation as the Cortex A15. And it's also an interesting target in any case.

ploxiln · 2015-10-01T02:12:16Z

It's probably worth mentioning that the time-part of the id will jump back 18577 hours. I don't think there's much chance of overlap; that's about as long as nsq has existed.

func main() {
    ns := time.Now().UnixNano()
    ms := ns / 1e6
    pms := ns >> 20
    fmt.Printf("milliseconds = %d\n", ms);
    fmt.Printf("pseudo-milliseconds = %d\n", pms);
    fmt.Printf("difference = %d\n", ms - pms);
    fmt.Printf("hours = %d\n", (ms - pms) / (1000 * 3600));
}
[pierce@plo-pro current]$ go run test.go
milliseconds = 1443665428226
pseudo-milliseconds = 1376786640383
difference = 66878787843
hours = 18577

ploxiln · 2015-10-01T03:09:39Z

(maybe I should adjust twepoch?)

ploxiln · 2015-10-01T05:11:18Z

Trying to use the ubuntu-packaged go1.3.3 on arm, the build fails with "undefined: atomic.Value". Accidental compatibility slip, or just an arm limitation?

Anyway, continuing with testing with a binary build of go 1.4.2 for arm I got from http://dave.cheney.net/unofficial-arm-tarballs (seems legit, I even think I've heard of this guy...). I may be able to build and test go 1.5.1 but that won't be until late tomorrow if I do.

ploxiln · 2015-10-01T06:04:05Z

Did have a noticeable improvement for pub:

Switched to branch 'master'
# using --mem-queue-size=1000000 --data-path= --size=200 --batch-size=200
PUB: [bench_writer] 2015/10/01 01:25:05 duration: 10.004711496s - 5.838mb/s - 30605.580ops/s - 32.674us/op
SUB: [bench_reader] 2015/10/01 01:25:15 duration: 10.016315311s - 1.698mb/s - 8900.379ops/s - 112.355us/op
Switched to branch 'time_div_optim'
# using --mem-queue-size=1000000 --data-path= --size=200 --batch-size=200
PUB: [bench_writer] 2015/10/01 01:26:25 duration: 10.004926446s - 6.470mb/s - 33923.288ops/s - 29.478us/op
SUB: [bench_reader] 2015/10/01 01:26:35 duration: 10.019001199s - 1.698mb/s - 8904.979ops/s - 112.297us/op
Switched to branch 'master'
# using --mem-queue-size=1000000 --data-path= --size=200 --batch-size=200
PUB: [bench_writer] 2015/10/01 01:27:31 duration: 10.004017781s - 5.941mb/s - 31147.486ops/s - 32.105us/op
SUB: [bench_reader] 2015/10/01 01:27:41 duration: 10.025512223s - 1.694mb/s - 8879.945ops/s - 112.613us/op
Switched to branch 'time_div_optim'
# using --mem-queue-size=1000000 --data-path= --size=200 --batch-size=200
PUB: [bench_writer] 2015/10/01 01:28:36 duration: 10.004349052s - 6.558mb/s - 34385.046ops/s - 29.082us/op
SUB: [bench_reader] 2015/10/01 01:28:46 duration: 10.015671236s - 1.712mb/s - 8977.831ops/s - 111.385us/op

Also: many ARM SoC have multiple cores, and GOMAXPROCS=2 almost eliminates the difference this branch makes, for this simple bench.sh (pretty consistent pub perf of 10.5 mb/s -> 10.8 mb/s)

ploxiln · 2015-10-01T18:39:45Z

There were a few options for twepoch, I decided to keep it almost exactly the same but update the comment.

I've done as much half-hearted ARMv7 benchmarking as I'm inclined to do. fwiw

64-bit division operations appear to be slow on some ARM systems, so replace division by 1000000 with a bitshift equivalent to division by 1048576. adapt guid twepoch comment for revised guid timestamp scheme

mreiferson · 2015-10-01T19:02:55Z

@ploxiln correct me if I'm wrong, but won't this also change the theoretical number of unique IDs we're able to generate per second?

ploxiln · 2015-10-01T19:21:58Z

Yes, there will be 5% fewer unique IDs per second.

>>> (10**9 / 2**20) * 1.0 / 1e3 
0.953

ploxiln · 2015-10-01T21:26:52Z

So, it reduces the theoretical max messages per workerID per second from 4,096,000 to 3,903,488

nsqd: optimize guid generation

ploxiln mentioned this pull request Sep 29, 2015

nsqd: cpu usage on ARM #658

Closed

mreiferson added the perf label Sep 30, 2015

ploxiln force-pushed the time_div_optim branch from 7050423 to e57ab2e Compare October 1, 2015 01:29

nsqd: optimize NewGUID() division for "milliseconds"

8cf036a

64-bit division operations appear to be slow on some ARM systems, so replace division by 1000000 with a bitshift equivalent to division by 1048576. adapt guid twepoch comment for revised guid timestamp scheme

ploxiln force-pushed the time_div_optim branch from 0e329a5 to 8cf036a Compare October 1, 2015 18:52

mreiferson added a commit that referenced this pull request Oct 1, 2015

Merge pull request #663 from ploxiln/time_div_optim

ba7c4b4

nsqd: optimize guid generation

mreiferson merged commit ba7c4b4 into nsqio:master Oct 1, 2015

ploxiln deleted the time_div_optim branch April 22, 2016 22:33

ploxiln mentioned this pull request Jan 18, 2019

nsqd: stops sending messages after moving time backward #1124

Open

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

nsqd: optimize guid generation #663

nsqd: optimize guid generation #663

ploxiln commented Sep 29, 2015

ploxiln commented Sep 30, 2015

ploxiln commented Sep 30, 2015

mreiferson commented Sep 30, 2015

ploxiln commented Sep 30, 2015

mreiferson commented Sep 30, 2015

mreiferson commented Sep 30, 2015

ploxiln commented Oct 1, 2015

ploxiln commented Oct 1, 2015

ploxiln commented Oct 1, 2015

ploxiln commented Oct 1, 2015

ploxiln commented Oct 1, 2015

ploxiln commented Oct 1, 2015

mreiferson commented Oct 1, 2015

ploxiln commented Oct 1, 2015

ploxiln commented Oct 1, 2015

nsqd: optimize guid generation #663

nsqd: optimize guid generation #663

Conversation

ploxiln commented Sep 29, 2015

ploxiln commented Sep 30, 2015

ploxiln commented Sep 30, 2015

mreiferson commented Sep 30, 2015

ploxiln commented Sep 30, 2015

mreiferson commented Sep 30, 2015

mreiferson commented Sep 30, 2015

ploxiln commented Oct 1, 2015

ploxiln commented Oct 1, 2015

ploxiln commented Oct 1, 2015

ploxiln commented Oct 1, 2015

ploxiln commented Oct 1, 2015

ploxiln commented Oct 1, 2015

mreiferson commented Oct 1, 2015

ploxiln commented Oct 1, 2015

ploxiln commented Oct 1, 2015