Optimize date_histograms across daylight savings time #55559

Merged
46 commits merged on May 7, 2020
Changes from 43 commits

Commits (46)
c230cf8
Take that Rounding!
nik9000 Mar 31, 2020
971b473
This time for sure
nik9000 Apr 18, 2020
41a7699
Fix quarter of year
nik9000 Apr 18, 2020
308a458
Benchmark
nik9000 Apr 18, 2020
b33d546
Try
nik9000 Apr 19, 2020
bd18c2a
WIP
nik9000 Apr 20, 2020
35c6ef4
Moar
nik9000 Apr 20, 2020
bb01ea9
All
nik9000 Apr 20, 2020
acf1971
Check this
nik9000 Apr 20, 2020
e242a3d
UTC?
nik9000 Apr 20, 2020
8ef7fb7
WIP
nik9000 Apr 21, 2020
d65b447
WIP
nik9000 Apr 21, 2020
b8eefda
Spotless
nik9000 Apr 21, 2020
27089f0
Better!
nik9000 Apr 21, 2020
0951db5
Words
nik9000 Apr 21, 2020
2a24404
JMH
nik9000 Apr 21, 2020
c99ea89
JMH2
nik9000 Apr 21, 2020
cf5c5e9
Words
nik9000 Apr 21, 2020
6be2d06
Merge branch 'master' into tz_round
nik9000 Apr 21, 2020
49a60e5
Revert extra
nik9000 Apr 21, 2020
706007e
Words
nik9000 Apr 21, 2020
1185b46
Unused import
nik9000 Apr 21, 2020
3a6b46e
Explain
nik9000 Apr 21, 2020
9ccecd7
Checkstyle
nik9000 Apr 21, 2020
587605d
Default forks?
nik9000 Apr 22, 2020
8fbeb0b
More clear
nik9000 Apr 22, 2020
dcc3b5b
Preserve test without searchable field
nik9000 Apr 22, 2020
f646f23
Prepare rounding in factory
nik9000 Apr 22, 2020
587a6cd
Comment on why optimization can't work here
nik9000 Apr 22, 2020
2223966
line length
nik9000 Apr 22, 2020
c073501
Different counts
nik9000 Apr 22, 2020
b53b59d
Private!
nik9000 Apr 22, 2020
5ba5dc7
Tests in the future!
nik9000 Apr 22, 2020
dd0d16e
Nothing is sensible
nik9000 Apr 22, 2020
c8fcdd0
Merge branch 'master' into tz_round
nik9000 Apr 22, 2020
b3538a1
Merge branch 'master' into tz_round
nik9000 Apr 23, 2020
675fc07
abstract
nik9000 Apr 23, 2020
28a10ab
tMerge branch 'master' into tz_round
nik9000 Apr 24, 2020
ede9cbc
Merge branch 'master' into tz_round
nik9000 Apr 28, 2020
8e52138
Merge branch 'master' into tz_round
nik9000 Apr 28, 2020
44424ea
Fix merge mistake
nik9000 Apr 28, 2020
83e9ecc
Merge branch 'master' into tz_round
nik9000 Apr 29, 2020
7551cbc
Merge branch 'master' into tz_round
nik9000 Apr 29, 2020
986afab
Merge branch 'master' into tz_round
nik9000 May 6, 2020
9d5c1c5
Blackhole
nik9000 May 6, 2020
83e72ed
Merge branch 'master' into tz_round
nik9000 May 6, 2020
4 changes: 2 additions & 2 deletions benchmarks/README.md
@@ -53,9 +53,9 @@ To get realistic results, you should exercise care when running benchmarks. Here
`performance` CPU governor.
* Vary the problem input size with `@Param`.
* Use the integrated profilers in JMH to dig deeper if benchmark results do not match your hypotheses:
-    * Run the generated uberjar directly and use `-prof gc` to check whether the garbage collector runs during a microbenchmark and skews
+    * Add `-prof gc` to the options to check whether the garbage collector runs during a microbenchmark and skews
       your results. If so, try to force a GC between runs (`-gc true`) but watch out for the caveats.
-    * Use `-prof perf` or `-prof perfasm` (both only available on Linux) to see hotspots.
+    * Add `-prof perf` or `-prof perfasm` (both only available on Linux) to see hotspots.
* Have your benchmarks peer-reviewed.
Member Author:

@danielmitterdorfer , these are for you!

Member:

These changes make sense to me!


### Don't

120 additions: RoundingBenchmark.java (new file)
@@ -0,0 +1,120 @@
/*
* Licensed to Elasticsearch under one or more contributor
* license agreements. See the NOTICE file distributed with
* this work for additional information regarding copyright
* ownership. Elasticsearch licenses this file to you under
* the Apache License, Version 2.0 (the "License"); you may
* not use this file except in compliance with the License.
* You may obtain a copy of the License at
*
* http://www.apache.org/licenses/LICENSE-2.0
*
* Unless required by applicable law or agreed to in writing,
* software distributed under the License is distributed on an
* "AS IS" BASIS, WITHOUT WARRANTIES OR CONDITIONS OF ANY
* KIND, either express or implied. See the License for the
* specific language governing permissions and limitations
* under the License.
*/

package org.elasticsearch.common;

import org.elasticsearch.common.time.DateFormatter;
import org.openjdk.jmh.annotations.Benchmark;
import org.openjdk.jmh.annotations.BenchmarkMode;
import org.openjdk.jmh.annotations.Fork;
import org.openjdk.jmh.annotations.Measurement;
import org.openjdk.jmh.annotations.Mode;
import org.openjdk.jmh.annotations.OutputTimeUnit;
import org.openjdk.jmh.annotations.Param;
import org.openjdk.jmh.annotations.Scope;
import org.openjdk.jmh.annotations.Setup;
import org.openjdk.jmh.annotations.State;
import org.openjdk.jmh.annotations.Warmup;

import java.time.ZoneId;
import java.util.concurrent.TimeUnit;
import java.util.function.Supplier;

@Fork(2)
@Warmup(iterations = 10)
@Measurement(iterations = 5)
@BenchmarkMode(Mode.AverageTime)
@OutputTimeUnit(TimeUnit.NANOSECONDS)

Member:

Nanoseconds don't seem handy for the output, and I wonder whether it makes sense to switch to microseconds here? Especially when count is set to 1000000 the score is quite high.

Member Author:

Thanks so much for reviewing so quickly! Some of this stuff I just copied from another benchmark to get started and never thought to play with.

The count is interesting: it is nice to be able to see a count of 1, just to know how much overhead I'm adding to one-off rounds. So NANOSECONDS is sort of a better report for those rounds. But I'll play with bumping it!

@State(Scope.Benchmark)
public class RoundingBenchmark {
    private static final DateFormatter FORMATTER = DateFormatter.forPattern("date_optional_time");

    @Param({
        "2000-01-01 to 2020-01-01", // A super long range
        "2000-10-01 to 2000-11-01", // A whole month which is pretty believable
        "2000-10-29 to 2000-10-30", // A date right around daylight savings time.
        "2000-06-01 to 2000-06-02"  // A date fully in one time zone. Should be much faster than above.
    })
    public String range;

    @Param({ "java time", "es" })
    public String rounder;

    @Param({ "UTC", "America/New_York" })
    public String zone;

    @Param({ "MONTH_OF_YEAR", "HOUR_OF_DAY" })
    public String timeUnit;

    @Param({ "1", "10000", "1000000", "100000000" })
    public int count;

    private long min;
    private long max;
    private long[] dates;
    private Supplier<Rounding.Prepared> rounderBuilder;

    @Setup
    public void buildDates() {
        String[] r = range.split(" to ");
        min = FORMATTER.parseMillis(r[0]);
        max = FORMATTER.parseMillis(r[1]);
        dates = new long[count];
        long date = min;
        long diff = (max - min) / dates.length;
        for (int i = 0; i < dates.length; i++) {
            if (date >= max) {
                throw new IllegalStateException("made a bad date [" + date + "]");
            }
            dates[i] = date;
            date += diff;
        }
        Rounding rounding = Rounding.builder(Rounding.DateTimeUnit.valueOf(timeUnit)).timeZone(ZoneId.of(zone)).build();
        switch (rounder) {
            case "java time":
                rounderBuilder = rounding::prepareJavaTime;
                break;
            case "es":
                rounderBuilder = () -> rounding.prepare(min, max);
                break;
            default:
                throw new IllegalArgumentException("Expected rounder to be [java time] or [es]");
        }
    }

    @Benchmark
    public long round() {
        long sum = 0;
        Rounding.Prepared rounder = rounderBuilder.get();
        for (int i = 0; i < dates.length; i++) {
            sum += rounder.round(dates[i]);

Member:

It's safer here to sink it into a blackhole instead to reduce the potential for compiler tricks.

Member Author:

I started with that and found that it added pretty significant overhead, smearing out some of the differences. When I removed it the differences became more "sharp". I think this sort of thing is a safe way to work around the compiler tricks. `-prof perfasm` seems to bear that out, but computers are tricky.

Member:

The thing is that with the current code the JIT might be able to do loop unrolling (though I have not looked at the assembly yet).

Member Author:

My first instinct is that loop unrolling wouldn't bother me too much. I'm not really trying to measure the iteration, just the round calls. If loop unrolling let it optimize the round calls in fun ways, though, that might make the benchmark less valid. This is called in a fairly tight loop in production too, just not this tight, and not one the JVM would be able to unroll...

Member:

I ran the microbenchmarks again, once with your implementation and once with a cheaper version of Blackhole (calling a method that is not inlined), and there are cases where the score is vastly different, e.g.:

sum (your approach):

Benchmark                                                               (count)                   (range)  (rounder)     (timeUnit)            (zone)  Mode  Cnt            Score            Error   Units
RoundingBenchmark.round                                               100000000  2000-06-01 to 2000-06-02  java time    HOUR_OF_DAY               UTC  avgt   10   5468296772.000 ±   13822182.335   ns/op
RoundingBenchmark.round                                               100000000  2000-06-01 to 2000-06-02         es    HOUR_OF_DAY               UTC  avgt   10    949419080.100 ±    2728670.015   ns/op
RoundingBenchmark.round                                               100000000  2000-06-01 to 2000-06-02  java time    HOUR_OF_DAY  America/New_York  avgt   10  13193144584.600 ±   86152570.014   ns/op
RoundingBenchmark.round                                               100000000  2000-06-01 to 2000-06-02         es    HOUR_OF_DAY  America/New_York  avgt   10    947821052.350 ±    6119635.225   ns/op

don't inline:

Benchmark                                                               (count)                   (range)  (rounder)     (timeUnit)            (zone)  Mode  Cnt            Score            Error   Units
RoundingBenchmark.round                                               100000000  2000-06-01 to 2000-06-02  java time    HOUR_OF_DAY               UTC  avgt   10   5559418450.700 ±    3268596.659   ns/op
RoundingBenchmark.round                                               100000000  2000-06-01 to 2000-06-02         es    HOUR_OF_DAY               UTC  avgt   10   1363302403.000 ±   16780559.912   ns/op
RoundingBenchmark.round                                               100000000  2000-06-01 to 2000-06-02  java time    HOUR_OF_DAY  America/New_York  avgt   10  13554514668.900 ±   25928733.412   ns/op
RoundingBenchmark.round                                               100000000  2000-06-01 to 2000-06-02         es    HOUR_OF_DAY  America/New_York  avgt   10   1354972403.700 ±   15574272.765   ns/op

This means that in some cases your microbenchmark implementation shows numbers that are faster than they would be without inlining, e.g. ~1.4 times faster in the last case in the table above, and the non-inlined case is closer to what production code looks like. We do see a significant speedup compared to the original implementation with both microbenchmark approaches anyway, so I think the actual change makes sense. It's just that the improvement is probably slightly less pronounced in practice than the microbenchmarks indicate.
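
For reference, here is a rough sketch of the "don't inline" variant described above, written as if it were another method on the RoundingBenchmark class; the method names and the use of @CompilerControl are illustrative, not necessarily the exact code that was run:

// Sketch of the "calling a method that is not inlined" sink.
// CompilerControl is org.openjdk.jmh.annotations.CompilerControl.
@Benchmark
public void roundWithNonInlinedSink() {
    Rounding.Prepared rounder = rounderBuilder.get();
    for (int i = 0; i < dates.length; i++) {
        sink(rounder.round(dates[i]));
    }
}

// Because this call is never inlined, the JIT has to materialize every rounded
// value to pass it as an argument, so the round() calls cannot be dead-code
// eliminated; per call this is cheaper than Blackhole.consume.
@CompilerControl(CompilerControl.Mode.DONT_INLINE)
private static void sink(long value) {}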

Member Author:

Blackhole it is then! I'll make the change. I had thought that the synchronization overhead that Blackhole adds was sort of unrealistic. And it might be, but it is more realistic than sum.

Member Author:

s/synchronization/volatile/

        }
        return sum;
    }
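
To make the adopted approach concrete, here is a minimal sketch of the round loop with a JMH-injected Blackhole in place of the running sum (a sketch only; the committed version may differ in detail):

    // Blackhole is org.openjdk.jmh.infra.Blackhole; JMH injects it as a parameter.
    @Benchmark
    public void round(Blackhole bh) {
        Rounding.Prepared rounder = rounderBuilder.get();
        for (int i = 0; i < dates.length; i++) {
            // consume() keeps the JIT from eliminating the round() call, at the
            // cost of roughly a volatile write per iteration.
            bh.consume(rounder.round(dates[i]));
        }
    }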

    @Benchmark
    public long nextRoundingValue() {
        long sum = 0;
        Rounding.Prepared rounder = rounderBuilder.get();
        for (int i = 0; i < dates.length; i++) {
            sum += rounder.nextRoundingValue(dates[i]);

Member:

It's safer here to sink it into a blackhole instead to reduce the potential for compiler tricks.

Member Author:

I talked with @danielmitterdorfer about this. The Blackhole adds quite a bit of overhead for such fast methods, something like 10%, which makes it harder to see whether a change actually makes things faster. On the other hand, the JVM can totally unroll this loop here, while in production it (mostly) can't because these values will be coming from Lucene.

        }
        return sum;
    }
}