-
Notifications
You must be signed in to change notification settings - Fork 5.9k
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
executor: decrease the memory usage of hashTable in HashJoinExec #11832
Conversation
Do we have any memory benchmark for this? |
Codecov Report
@@ Coverage Diff @@
## master #11832 +/- ##
===========================================
Coverage 81.4279% 81.4279%
===========================================
Files 442 442
Lines 95186 95186
===========================================
Hits 77508 77508
Misses 12216 12216
Partials 5462 5462 |
/run-all-tests |
benchmark resultTLDR
go test -run=XX -bench="BenchmarkBuildHashTableForList|BenchmarkHashJoinExec" -test.benchmem -count 10
|
/run-all-tests |
9f43098
to
56f17e6
Compare
/run-all-tests |
executor/index_lookup_join.go
102: lookupMap *mvmap.MVMap
368: lookupMap: mvmap.NewMVMap(),
expression/aggregation/util.go
26: existingKeys *mvmap.MVMap
35: existingKeys: mvmap.NewMVMap(),
It seems that these two usages can be optimized, too. Maybe we can create an issue for it? |
executor/join.go
Outdated
@@ -50,7 +50,7 @@ type HashJoinExec struct { | |||
|
|||
// concurrency is the number of partition, build and join workers. | |||
concurrency uint | |||
hashTable *mvmap.MVMap | |||
hashTable rowHashTable | |||
innerFinished chan error | |||
hashJoinBuffers []*hashJoinBuffer |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
It seems that this field can be removed.
executor/join.go
Outdated
} | ||
|
||
func (e *HashJoinExec) matchJoinKey(inner, outer chunk.Row) bool { | ||
innerAllTypes := retTypes(e.innerExec) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
For my part, innerAllTypes, outerAllTypes := xxx
, innerIdx, outerIdx := xxx
, innerTp, outerTp := xxx
are more clearer?
util/chunk/column.go
Outdated
elemLen := len(c.elemBuf) | ||
data = c.data[rowID*elemLen : rowID*elemLen+elemLen] | ||
} else { | ||
start, end := c.offsets[rowID], c.offsets[rowID] |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Does it seem to be wrong that start == end
, any test case covered this?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
The benchmark was fake because of this bug. 😢
I'll update a new one.
util/codec/codec.go
Outdated
return nil, errors.Trace(err) | ||
flag := varintFlag | ||
if mysql.HasUnsignedFlag(allTypes[i].Flag) { | ||
if integer := row.GetInt64(i); integer < 0 { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
if integer := row.GetInt64(i); integer < 0 { | |
if row.GetInt64(i) < 0 { |
util/codec/codec.go
Outdated
default: | ||
return nil, errors.Errorf("unsupport column type for encode %d", allTypes[i].Tp) | ||
_, err = row.WriteTo(i, w) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
It's dangerous to skip the type interpretation and hash the raw data directly in the table building stage.
For example:
create table t1(x float);
create table t2(x double);
insert into t1 values (1);
insert into t2 values (1);
select * from t1, t2 where t1.x=t2.x;
An empty set is returned in this PR since it doesn't take data types into account when building the inner hash table.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
4891038
to
afe7881
Compare
/run-all-tests |
/run-all-tests |
4386e72
to
134780a
Compare
@XuHuaiyu got two LGTMs, PTAL again~ |
} | ||
|
||
// Put puts the key/rowPtr pairs to the rowHashMap, multiple rowPtrs are stored in a list. | ||
func (m *rowHashMap) Put(hashKey uint64, rowPtr chunk.RowPtr) { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
If the origin rows are all stored in a Chunk but not a List, we need to pass chunk.RowPtr{0, rowIdx}
into this function?
At least for now, but we can bench and optimize it later.
… On Aug 29, 2019, at 2:08 PM, HuaiyuXu ***@***.***> wrote:
@XuHuaiyu commented on this pull request.
In executor/hash_table.go <#11832 (comment)>:
> + length int
+}
+
+// newRowHashMap creates a new rowHashMap.
+func newRowHashMap() *rowHashMap {
+ m := new(rowHashMap)
+ // TODO(fengliyuan): initialize the size of map from the estimated row count for better performance.
+ m.hashTable = make(map[uint64]entryAddr)
+ m.entryStore.slices = [][]entry{make([]entry, 0, 64)}
+ // Reserve the first empty entry, so entryAddr{} can represent nullEntryAddr.
+ m.entryStore.put(entry{})
+ return m
+}
+
+// Put puts the key/rowPtr pairs to the rowHashMap, multiple rowPtrs are stored in a list.
+func (m *rowHashMap) Put(hashKey uint64, rowPtr chunk.RowPtr) {
If the origin rows are all stored in a Chunk but not a List, we need to pass chunk.RowPtr{0, rowIdx} into this function?
—
You are receiving this because you authored the thread.
Reply to this email directly, view it on GitHub <#11832?email_source=notifications&email_token=AAGTYNNX6DGJDQTERXDI4ETQG5RXDA5CNFSM4IOSJJNKYY3PNVWWK3TUL52HS4DFWFIHK3DMKJSXC5LFON2FEZLWNFSXPKTDN5WW2ZLOORPWSZGOCDBVAQQ#pullrequestreview-281235522>, or mute the thread <https://github.com/notifications/unsubscribe-auth/AAGTYNPU6HUPAERWWLDFAODQG5RXDANCNFSM4IOSJJNA>.
|
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
LGTM
/run-all-tests |
@SunRunAway merge failed. |
/run-common-test |
/bench |
What problem does this PR solve?
part of #11607
What is changed and how it works?
See section 1 of #11607 (comment)
See benchmark at #11832 (comment)
Check List
Tests
Code changes
Side effects
Related changes